[Day 12] 最佳化演算法中 Optimization Algorithms II

第 12 屆 iThome 鐵人賽

DAY 12

AI & Data

窺探人工智慧與資料科學的面貌系列第 12 篇

12th鐵人賽

幻影狐狸

2020-09-27 17:35:52

1285 瀏覽

分享至

上回介紹了凸函數的定義及一些性質，今天來講一個解最佳化問題很實用的方法：Mirror Descent。

Legendre-Fenchel

我們考慮開凸集合 $\mathcal{D} \subset \mathbb{R}^d$，另外， $\overline{\mathcal{D}}$ 表示 $\mathcal{D}$ 的必集合。 $\left\lVert \cdot \right\rVert$ 表示 $\mathbb{R}^d$ 的範數（Norm），而 $\left\lVert \cdot \right\rVert_$ 則是對偶範數（Dual Norm），定義如下：
$$
\left\lVert u \right\rVert_ = \sup_{x \in \mathbb{R}^d: \left\lVert x \right\rVert = 1} x^Tu.
$$
讓 $f: \overline{\mathcal{D}} \rightarrow \mathbb{R}$ 是個凸函數，則 $f$ 的 Legendre-Fenchel 變換定義如下：
$$
f^(u) = \sup_{x \in \overline{\mathcal{D}}} \left( x^Tu - f(x) \right).
$$
例如，若 $f(x) = \frac{1}{2} \left\lVert x \right\rVert^2$，則 $f^(u) = \frac{1}{2} \left\lVert u \right\rVert_*^2$。而一任意連續函數 $F:\overline{\mathcal{D}} \rightarrow \mathbb{R}$ 稱為 **Legendre **函數，則：

$F$ 是嚴格凸函數且在 $\mathcal{D}$ 存在連續一次偏微
$\lim_{x \rightarrow \overline{\mathcal{D}} \backslash \mathcal{D}} \left\lVert \nabla F(x) \right\rVert = + \infty$

Bregman Divergence

Bergman 散度 $D_F: \overline{\mathcal{D}} \times \mathcal{D}$ 對應的 Legendre 函數 $F$ 定義如下：
$$
D_F(x, y) = F(x) - F(y) - (x - y)^T \nabla F(y)
$$
除此之外，$\mathcal{D}^* = \nabla F(\mathcal{D})$ 則為 $\mathcal{D}$ 在 $F$ 下的對偶空間（Dual Space）。

Mirror Descent

Mirror Descent 的步驟如下：
$$
\begin{align}
w_{t+1} &= \nabla F^* \left( \nabla F(a_t) - \eta \nabla \lambda(a_t, x) \right) \
a_{t+1} &= \mathop{\arg\min}{a \in \mathcal{A}} D_F(a, w{t+1})
\end{align}
$$

其中 $w_i$ 為 $h$ 的參數（parameters），我們利用更新 $w$ 來解 $\mathop{\arg\min}{h \in \mathcal{H}} \sum{i=1}^N \lambda(h, (x_i, y_i))$ 問題。